Although DETR-based 3D detectors can simplify the detection pipeline and achieve direct sparse predictions, their performance still lags behind dense detectors with post-processing for 3D object detection from point clouds. DETRs usually adopt a larger number of queries than GTs (e.g., 300 queries vs. 40 objects in Waymo) in a scene, which inevitably incurs many false positives during inference. In this paper, we propose a simple yet effective sparse 3D detector, named Query Contrast Voxel-DETR (ConQueR), to eliminate the challenging false positives and achieve more accurate and sparser predictions. We observe that most false positives are highly overlapping in local regions, caused by the lack of explicit supervision to discriminate locally similar queries. We thus propose a Query Contrast mechanism to explicitly enhance queries towards their best-matched GTs over all unmatched query predictions. This is achieved by constructing positive and negative GT-query pairs for each GT, and applying a contrastive loss to enhance positive GT-query pairs against negative ones based on feature similarities. ConQueR closes the gap between sparse and dense 3D detectors, and reduces false positives by up to ~60%. Our single-frame ConQueR achieves a new state-of-the-art (SOTA) 71.6 mAPH/L2 on the challenging Waymo Open Dataset validation set, outperforming previous SOTA methods (e.g., PV-RCNN++) by over 2.0 mAPH/L2.
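To make the Query Contrast mechanism concrete, below is a minimal PyTorch sketch of the contrastive objective: each GT's Hungarian-matched query serves as the positive, all unmatched queries serve as negatives, and an InfoNCE-style loss is computed over normalized feature similarities. How GT boxes are embedded, the temperature value, and the function names are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def query_contrast_loss(query_embed, gt_embed, matched_idx, tau=0.1):
    """InfoNCE-style Query Contrast sketch (assumed form).

    query_embed: (Q, D) embeddings of all object queries in a scene.
    gt_embed:    (G, D) embeddings of the GT boxes; how GTs are embedded
                 is an assumption here.
    matched_idx: (G,) long tensor, index of the Hungarian-matched query
                 for each GT.
    """
    q = F.normalize(query_embed, dim=-1)
    g = F.normalize(gt_embed, dim=-1)
    sim = g @ q.t() / tau                  # (G, Q) scaled cosine similarities
    # For each GT, its matched query is the positive class and every
    # unmatched query acts as a negative in the softmax over queries.
    return F.cross_entropy(sim, matched_idx)
```

Pulling each GT's similarity mass onto its single matched query is what suppresses the duplicated, locally similar queries that would otherwise surface as false positives.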
In recent years, semi-supervised graph learning with data augmentation (DA) has become the most commonly used and best-performing approach to enhancing model robustness in sparse scenarios with few labeled samples. Unlike in homogeneous graphs, DA in heterogeneous graphs faces greater challenges: the heterogeneity of information requires DA strategies to effectively handle heterogeneous relations, taking into account the information contribution of different types of neighbors and edges to the target nodes. Furthermore, over-squashing of information is caused by the negative curvature formed by the non-uniform distribution and strong clustering in complex graphs. To address these challenges, this paper presents a novel method named Semi-Supervised Heterogeneous Graph Learning with Multi-level Data Augmentation (HG-MDA). For the problem of heterogeneity of information in DA, node and topology augmentation strategies are proposed for the characteristics of heterogeneous graphs, and meta-relation-based attention is applied as one of the indexes for selecting augmented nodes and edges. For the problem of over-squashing of information, triangle-based edge adding and removing are designed to alleviate the negative curvature and bring topological gains. Finally, the loss function consists of the cross-entropy loss for labeled data and a consistency regularization for unlabeled data, and sharpening is used to effectively fuse the prediction results of the various DA strategies. Experiments on the public datasets ACM, DBLP, and OGB, and the industry dataset MB, show that HG-MDA outperforms current SOTA models. Additionally, HG-MDA is applied to user identification in internet finance scenarios, helping the business add 30% of key users and increase loans and balances by 3.6%, 11.1%, and 9.8%.
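As a rough illustration of the overall objective, the sketch below combines cross-entropy on labeled nodes with a consistency term over the predictions produced under different DA strategies, fused by averaging and sharpened into a pseudo-target. The MSE form of the consistency loss, the temperature value, and all names are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sharpen(p, T=0.5):
    # Temperature sharpening of a class distribution: p_i^(1/T), renormalized.
    p = p ** (1.0 / T)
    return p / p.sum(dim=-1, keepdim=True)

def hg_mda_loss(logits_labeled, labels, logits_views, lambda_u=1.0, T=0.5):
    """Assumed form of the combined semi-supervised objective.

    logits_labeled: (N_l, C) logits for labeled nodes.
    logits_views:   list of (N_u, C) logits for the same unlabeled nodes,
                    one entry per DA strategy.
    """
    sup = F.cross_entropy(logits_labeled, labels)
    probs = torch.stack([F.softmax(l, dim=-1) for l in logits_views])  # (V, N_u, C)
    # Fuse the views by averaging, then sharpen to get a low-entropy target.
    target = sharpen(probs.mean(dim=0), T).detach()
    cons = torch.stack([F.mse_loss(p, target) for p in probs]).mean()
    return sup + lambda_u * cons
```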
Predicting the multimodal future behavior of traffic participants is essential for robotic vehicles to make safe decisions. Existing works either predict future trajectories directly from latent features, or utilize dense goal candidates to identify agents' destinations; the former strategy converges slowly since all motion modes are derived from the same feature, while the latter has efficiency issues since its performance depends heavily on the density of the goal candidates. In this paper, we propose the Motion TRansformer (MTR) framework, which models motion prediction as the joint optimization of global intention localization and local movement refinement. Instead of using goal candidates, MTR incorporates spatial intention priors by adopting a small set of learnable motion query pairs. Each motion query pair takes charge of trajectory prediction and refinement for a specific motion mode, which stabilizes the training process and facilitates better multimodal predictions. Experiments show that MTR achieves state-of-the-art performance on both the marginal and joint motion prediction challenges, ranking first on the Waymo Open Motion Dataset leaderboards. Code will be available at https://github.com/sshaoshuai/MTR.
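A toy sketch of the motion query idea follows: K learnable intention queries (global intention localization) are refined against encoded scene tokens by a decoder layer (local movement refinement), and each query decodes one candidate trajectory. All sizes, the single decoder layer, and the head structure are illustrative assumptions; MTR's actual decoder is iterative and pairs each static intention query with a dynamic searching query.

```python
import torch
import torch.nn as nn

class MotionQueryHead(nn.Module):
    """Toy motion-query head (assumed structure, not MTR's exact one)."""
    def __init__(self, num_modes=64, d_model=256, horizon=80):
        super().__init__()
        self.horizon = horizon
        # Static intention queries: one learnable embedding per motion mode.
        self.intention = nn.Embedding(num_modes, d_model)
        self.decoder = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.traj_head = nn.Linear(d_model, horizon * 2)  # (x, y) per future step

    def forward(self, scene_feat):
        # scene_feat: (B, N, d_model) encoded agent/map tokens.
        B = scene_feat.size(0)
        q = self.intention.weight.unsqueeze(0).expand(B, -1, -1)
        q = self.decoder(q, scene_feat)        # refine each mode against the scene
        return self.traj_head(q).view(B, -1, self.horizon, 2)  # (B, K, T, 2)
```

During training, one common way to realize "each query pair takes charge of a specific motion mode" is a winner-takes-all assignment, where only the mode best matching the ground truth receives the regression loss.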
In this report, we present the first-place solution for the Motion Prediction track of the 2022 Waymo Open Dataset Challenges. We propose a novel Motion Transformer framework for multimodal motion prediction, which introduces a small set of novel motion query pairs to generate better multimodal future trajectories by jointly performing intention localization and iterative motion refinement. A simple model ensemble strategy with non-maximum suppression is adopted to further boost the final performance. Our approach achieves first place on the motion prediction leaderboard of the 2022 Waymo Open Dataset Challenges, outperforming other methods by remarkable margins. Code will be available at https://github.com/sshaoshuai/MTR.
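The non-maximum suppression over ensembled trajectory candidates might look like the following sketch, which greedily keeps the highest-scoring trajectories and suppresses candidates whose endpoints fall within a distance threshold of an already-kept one. The endpoint-distance criterion and the parameter values are assumptions, as the report does not spell them out here.

```python
import numpy as np

def trajectory_nms(trajs, scores, thresh=2.5, topk=6):
    """Greedy NMS over trajectory candidates pooled from an ensemble.

    trajs:  (M, T, 2) candidate future trajectories from all models.
    scores: (M,) confidence of each candidate.
    Suppression by final-position distance, thresh (meters), and topk
    are illustrative assumptions.
    """
    order = np.argsort(-scores)
    keep = []
    for i in order:
        end_i = trajs[i, -1]
        if all(np.linalg.norm(end_i - trajs[j, -1]) > thresh for j in keep):
            keep.append(i)
        if len(keep) == topk:
            break
    return keep
```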
In recent years, autonomous driving has been attracting increasing attention for its potential to relieve drivers' burdens and improve driving safety. In modern autonomous driving pipelines, the perception system is an indispensable component, aiming to accurately estimate the status of the surrounding environment and provide reliable observations for prediction and planning. 3D object detection, which intelligently predicts the locations, sizes, and categories of the critical 3D objects near an autonomous vehicle, is an important part of a perception system. This paper reviews the progress of 3D object detection for autonomous driving. First, we introduce the background of 3D object detection and discuss the challenges of this task. Second, we conduct a comprehensive survey of the progress in 3D object detection from the aspects of models and sensory inputs, including LiDAR-based, camera-based, and multi-modal detection approaches, and we provide an in-depth analysis of the potentials and challenges in each category of methods. In addition, we systematically investigate the applications of 3D object detection in driving systems. Finally, we conduct a performance analysis of 3D object detection approaches, further summarize the research trends over the years, and prospect the future directions of this field.
Accurate and reliable 3D detection is vital for many applications, including autonomous vehicles and service robots. In this paper, we propose a flexible and high-performance framework for 3D temporal object detection with point cloud sequences, named MPPNet. We propose a novel three-hierarchy framework with proxy points for multi-frame feature encoding and interaction to achieve better detection: the three hierarchies conduct per-frame feature encoding, short-clip feature fusion, and whole-sequence feature aggregation, respectively. To process long point cloud sequences with reasonable computational resources, intra-group feature mixing and inter-group feature attention are proposed to form the second and third feature encoding hierarchies, which are recurrently applied to aggregate multi-frame trajectory features. The proxy points not only serve as consistent object representations for each frame, but also act as couriers that facilitate feature interaction between frames. Experiments on the large Waymo Open Dataset show that our method outperforms state-of-the-art approaches by large margins when applied to both short (e.g., 4-frame) and long (e.g., 16-frame) point cloud sequences. Code is available at https://github.com/open-mmlab/OpenPCDet.
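A toy version of the second and third hierarchies could look like the sketch below: features of the per-frame proxy points are first mixed within short groups of frames (short-clip fusion), and attention then runs across the group outputs (whole-sequence aggregation). All layer choices and sizes are illustrative assumptions; MPPNet's actual blocks differ and are applied recurrently.

```python
import torch
import torch.nn as nn

class GroupedSequenceAggregator(nn.Module):
    """Toy sketch of grouped temporal fusion (assumed structure)."""
    def __init__(self, d=128, frames_per_group=4):
        super().__init__()
        self.fpg = frames_per_group
        self.intra_mix = nn.Sequential(            # intra-group feature mixing
            nn.Linear(d * frames_per_group, d), nn.ReLU(), nn.Linear(d, d))
        self.inter_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

    def forward(self, proxy_feat):
        # proxy_feat: (B, T, P, d) features of P proxy points over T frames;
        # T is assumed divisible by frames_per_group.
        B, T, P, d = proxy_feat.shape
        g = proxy_feat.reshape(B, T // self.fpg, self.fpg, P, d)
        g = g.permute(0, 1, 3, 2, 4).reshape(B, -1, P, self.fpg * d)
        g = self.intra_mix(g)                      # (B, G, P, d) short-clip fusion
        g = g.flatten(1, 2)                        # (B, G*P, d)
        out, _ = self.inter_attn(g, g, g)          # whole-sequence aggregation
        return out
```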
3D object detection is receiving increasing attention from both industry and academia thanks to its wide applications in various fields. In this paper, we propose point-voxel region-based convolutional neural networks (PV-RCNNs) for 3D object detection from point clouds. First, we propose a novel 3D detector, PV-RCNN, which consists of two steps: voxel-to-keypoint scene encoding and keypoint-to-grid RoI feature abstraction. The two steps deeply integrate a 3D voxel CNN with PointNet-based set abstraction to extract discriminative features. Second, we propose an advanced framework, PV-RCNN++, for more efficient and accurate 3D object detection. It consists of two major improvements: a sectorized proposal-centric strategy for efficiently producing more representative keypoints, and VectorPool aggregation for better aggregating local point features with much less resource consumption. With these two strategies, our PV-RCNN++ is about 2x faster than PV-RCNN, while also achieving better performance on the large-scale Waymo Open Dataset with a 150m × 150m detection range. Moreover, our proposed PV-RCNNs achieve state-of-the-art 3D detection performance on both the Waymo Open Dataset and the highly competitive KITTI benchmark. The source code is available at https://github.com/open-mmlab/OpenPCDet.
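The sectorized proposal-centric keypoint strategy can be sketched roughly as follows: candidate points are first filtered to the neighborhoods of proposals, then partitioned into azimuth sectors so sampling can run in parallel per sector. Uniform sampling here stands in for the farthest point sampling used in the paper, and the radius, sector count, and keypoint budget are assumptions.

```python
import numpy as np

def sectorized_proposal_centric_sampling(points, proposal_centers,
                                         radius=1.6, num_sectors=6, k=2048):
    """Sketch under simplifying assumptions; not the paper's implementation.

    points: (N, 3) scene points; proposal_centers: (M, 3) box centers.
    """
    # 1) Proposal-centric filtering: keep points close to any proposal center.
    d = np.linalg.norm(points[:, None, :2] - proposal_centers[None, :, :2], axis=-1)
    pts = points[d.min(axis=1) < radius]
    # 2) Sectorize by azimuth so each sector can be sampled independently.
    azimuth = np.arctan2(pts[:, 1], pts[:, 0])
    sector = ((azimuth + np.pi) / (2 * np.pi) * num_sectors).astype(int) % num_sectors
    keypoints = []
    for s in range(num_sectors):
        sec = pts[sector == s]
        n = min(len(sec), k // num_sectors)
        if n > 0:
            keypoints.append(sec[np.random.choice(len(sec), n, replace=False)])
    return np.concatenate(keypoints, axis=0) if keypoints else pts[:0]
```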
We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN), for accurate 3D object detection from point clouds. Our proposed method deeply integrates both 3D voxel Convolutional Neural Network (CNN) and PointNet-based set abstraction to learn more discriminative point cloud features. It takes advantage of the efficient learning and high-quality proposals of the 3D voxel CNN and the flexible receptive fields of the PointNet-based networks. Specifically, the proposed framework summarizes the 3D scene with a 3D voxel CNN into a small set of keypoints via a novel voxel set abstraction module to save follow-up computations and also to encode representative scene features. Given the high-quality 3D proposals generated by the voxel CNN, RoI-grid pooling is proposed to abstract proposal-specific features from the keypoints to the RoI-grid points via keypoint set abstraction with multiple receptive fields. Compared with conventional pooling operations, the RoI-grid feature points encode much richer context information for accurately estimating object confidences and locations. Extensive experiments on both the KITTI dataset and the Waymo Open Dataset show that our proposed PV-RCNN surpasses state-of-the-art 3D detection methods with remarkable margins by using only point clouds. Code is available at https://github.com/open-mmlab/OpenPCDet.
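A toy, axis-aligned version of RoI-grid pooling is sketched below: uniform grid points are generated inside a proposal, and each grid point aggregates the keypoint features within a radius. The masked max-pool is a crude stand-in for the keypoint set abstraction with multiple receptive fields, and the radius and grid size are assumptions.

```python
import torch

def roi_grid_pooling(keypoints, keypoint_feat, roi, grid_size=6, radius=0.8):
    """Toy RoI-grid pooling for one axis-aligned RoI; the real module
    handles rotated boxes, multiple radii, and PointNet-style MLPs.

    keypoints:     (K, 3) keypoint coordinates.
    keypoint_feat: (K, C) features from voxel set abstraction.
    roi:           (6,)  box as (cx, cy, cz, dx, dy, dz).
    """
    center, dims = roi[:3], roi[3:6]
    # Uniform grid points spanning the proposal box.
    lin = [torch.linspace(-0.5, 0.5, grid_size) for _ in range(3)]
    grid = torch.stack(torch.meshgrid(*lin, indexing='ij'), dim=-1).view(-1, 3)
    grid = grid * dims + center                                  # (G, 3)
    dist = torch.cdist(grid, keypoints)                          # (G, K)
    mask = (dist < radius).float().unsqueeze(-1)                 # (G, K, 1)
    # Crude aggregation: zero out far keypoints, then max-pool per grid point.
    feat = keypoint_feat.unsqueeze(0) * mask                     # (G, K, C)
    return feat.max(dim=1).values                                # (G, C)
```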
3D object detection from LiDAR point clouds is a challenging problem in 3D scene understanding and has many practical applications. In this paper, we extend our preliminary work PointRCNN to a novel and strong point-cloud-based 3D object detection framework, the part-aware and aggregation neural network (Part-A² net). The whole framework consists of the part-aware stage and the part-aggregation stage. First, the part-aware stage, for the first time, fully utilizes free-of-charge part supervisions derived from 3D ground-truth boxes to simultaneously predict high-quality 3D proposals and accurate intra-object part locations. The predicted intra-object part locations within the same proposal are grouped by our newly designed RoI-aware point cloud pooling module, which results in an effective representation that encodes the geometry-specific features of each 3D proposal. The part-aggregation stage then learns to re-score the box and refine its location by exploring the spatial relationships of the pooled intra-object part locations. Extensive experiments are conducted to demonstrate the performance improvements from each component of our proposed framework. Our Part-A² net outperforms all existing 3D detection methods and achieves a new state of the art on the KITTI 3D object detection benchmark by utilizing only the LiDAR point cloud data. Code is available at https://github.com/sshaoshuai/PointCloudDet3D.
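The RoI-aware pooling idea, which keeps empty voxels so box geometry is preserved, can be sketched for a single axis-aligned proposal as follows; rotation handling, the grid resolution, and the average-pool choice are simplifying assumptions.

```python
import torch

def roi_aware_pooling(points, feats, roi, grid=12):
    """Sketch of RoI-aware pooling for one axis-aligned proposal: points
    inside the box are scattered into a fixed grid, and empty voxels stay
    zero so the box geometry is preserved.

    points: (N, 3); feats: (N, C); roi: (cx, cy, cz, dx, dy, dz).
    """
    center, dims = roi[:3], roi[3:6]
    local = (points - center) / dims + 0.5               # normalize into [0, 1]^3
    inside = ((local >= 0) & (local < 1)).all(dim=1)
    local, f = local[inside], feats[inside]
    idx = (local * grid).long().clamp(max=grid - 1)      # voxel index per point
    flat = idx[:, 0] * grid * grid + idx[:, 1] * grid + idx[:, 2]
    out = torch.zeros(grid ** 3, f.size(1))
    out.index_add_(0, flat, f)                           # sum features per voxel
    cnt = torch.zeros(grid ** 3).index_add_(0, flat, torch.ones(len(f)))
    # Average pool; empty voxels remain all-zero.
    return (out / cnt.clamp(min=1).unsqueeze(1)).view(grid, grid, grid, -1)
```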
In this paper, we propose PointRCNN for 3D object detection from raw point clouds. The whole framework is composed of two stages: stage-1 for bottom-up 3D proposal generation, and stage-2 for refining the proposals in canonical coordinates to obtain the final detection results. Instead of generating proposals from RGB images or projecting the point cloud to bird's-eye view or voxels as previous methods do, our stage-1 sub-network directly generates a small number of high-quality 3D proposals from the point cloud in a bottom-up manner by segmenting the point cloud of the whole scene into foreground and background points. The stage-2 sub-network transforms the pooled points of each proposal to canonical coordinates to learn better local spatial features, which are combined with the global semantic features of each point learned in stage-1 for accurate box refinement and confidence prediction. Extensive experiments on the 3D detection benchmark of the KITTI dataset show that our proposed architecture outperforms state-of-the-art methods with remarkable margins by using only point clouds as input. The code is available at https://github.com/sshaoshuai/PointRCNN.
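The canonical transformation in stage-2 amounts to translating the pooled points to the proposal center and rotating them by the negative box heading, so every proposal is refined in a pose-normalized frame. A minimal sketch, assuming a yaw-only rotation around the z-axis:

```python
import math
import torch

def to_canonical(points, roi_center, roi_heading):
    """points: (N, 3) tensor; roi_center: (3,) tensor; roi_heading: yaw (float)."""
    local = points - roi_center                      # translate to box center
    c, s = math.cos(roi_heading), math.sin(roi_heading)
    rot = torch.tensor([[c,   s,  0.0],              # rotate by -heading so the
                        [-s,  c,  0.0],              # box points along +x
                        [0.0, 0.0, 1.0]])
    return local @ rot.t()
```

In this frame the refinement head only has to regress small residuals relative to a centered, axis-aligned box, which is what makes the local spatial features easier to learn.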